ReduceScatter
=================

Performs a distributed reduction of the specified type on the input array and scatters the result across the output locations. The supported reduction types are ReduceSum, ReduceMean, ReduceMax, and ReduceMin.

Inputs:

- **input_data** - Address of the input data.
- **data_size** - Length of the data.
- **reduce_type** - Reduction type:

  - 0: ReduceSum
  - 1: ReduceMean
  - 2: ReduceMax
  - 3: ReduceMin

- **core_mask** - Core mask (shared-memory version only).

Outputs:

- **output_data** - Address of the output data.

Supported platforms:
    ``FT78NE`` ``MT7004``

.. note::
    - FT78NE supports fp, dp, i8, i16, i32
    - MT7004 supports hp, fp, i16, i32

**Shared-memory version:**

.. c:function:: void fp_reducescatter_s(float* input_data, float* output_data, int data_size, int reduce_type, int core_mask)
.. c:function:: void hp_reducescatter_s(half* input_data, half* output_data, int data_size, int reduce_type, int core_mask)
.. c:function:: void dp_reducescatter_s(double* input_data, double* output_data, int data_size, int reduce_type, int core_mask)
.. c:function:: void i8_reducescatter_s(int8_t* input_data, int8_t* output_data, int data_size, int reduce_type, int core_mask)
.. c:function:: void i16_reducescatter_s(int16_t* input_data, int16_t* output_data, int data_size, int reduce_type, int core_mask)
.. c:function:: void i32_reducescatter_s(int* input_data, int* output_data, int data_size, int reduce_type, int core_mask)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 11

    /* Include the library header that declares
       fp_reducescatter_s (header name not given here). */

    int main() {
        float *input = (float *)0xA0000000;  // input resides in DDR space
        float *output = (float *)0xC0000000;
        int data_size = 1024;
        int reduce_type = 0;                 // 0: ReduceSum
        int core_mask = 0xff;

        fp_reducescatter_s(input, output, data_size, reduce_type, core_mask);
        return 0;
    }

**Private-memory version:**

.. c:function:: void fp_reducescatter_p(float* input_data, float* output_data, int data_size, int reduce_type)
.. c:function:: void hp_reducescatter_p(half* input_data, half* output_data, int data_size, int reduce_type)
.. c:function:: void dp_reducescatter_p(double* input_data, double* output_data, int data_size, int reduce_type)
.. c:function:: void i8_reducescatter_p(int8_t* input_data, int8_t* output_data, int data_size, int reduce_type)
.. c:function:: void i16_reducescatter_p(int16_t* input_data, int16_t* output_data, int data_size, int reduce_type)
.. c:function:: void i32_reducescatter_p(int* input_data, int* output_data, int data_size, int reduce_type)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 10

    /* Include the library header that declares
       fp_reducescatter_p (header name not given here). */

    int main() {
        float *input = (float *)0x10810000;  // input resides in L2 space
        float *output = (float *)0x10820000;
        int data_size = 1024;
        int reduce_type = 0;                 // 0: ReduceSum

        fp_reducescatter_p(input, output, data_size, reduce_type);
        return 0;
    }
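
**Reference sketch (host side):**

The page does not spell out how the reduced result is divided among the cores. The sketch below is a plain C model of one common reading, so that results can be cross-checked on a host machine. It is not the library implementation. The helper names ``count_cores`` and ``ref_reducescatter``, the ``REDUCE_*`` constants, the flat buffer layout, the convention that bit *i* of ``core_mask`` selects core *i*, and the equal-sized contiguous chunk per core are all assumptions made for illustration. Only the numeric ``reduce_type`` codes (0 to 3) come from the parameter list.

.. code-block:: c

    #include <assert.h>

    /* Reduction codes as listed for the reduce_type parameter. */
    enum { REDUCE_SUM = 0, REDUCE_MEAN = 1, REDUCE_MAX = 2, REDUCE_MIN = 3 };

    /* Hypothetical helper: number of cores selected by core_mask,
       assuming bit i of the mask selects core i. */
    static int count_cores(unsigned core_mask) {
        int n = 0;
        while (core_mask) {
            n += (int)(core_mask & 1u);
            core_mask >>= 1;
        }
        return n;
    }

    /* Hypothetical reference model (assumed semantics).
       inputs  : num_cores buffers of data_size floats, stored back to back.
       outputs : num_cores chunks of data_size / num_cores floats, back to back.
       Element i is reduced across all cores; core c then keeps the c-th
       contiguous chunk of the reduced array. */
    static void ref_reducescatter(const float *inputs, float *outputs,
                                  int data_size, int reduce_type,
                                  int num_cores) {
        assert(num_cores > 0 && data_size % num_cores == 0);
        int chunk = data_size / num_cores;
        for (int c = 0; c < num_cores; ++c) {
            for (int j = 0; j < chunk; ++j) {
                int i = c * chunk + j;              /* index in the full array */
                float acc = inputs[i];              /* contribution of core 0 */
                for (int k = 1; k < num_cores; ++k) {
                    float v = inputs[k * data_size + i];
                    switch (reduce_type) {
                    case REDUCE_MAX: if (v > acc) acc = v; break;
                    case REDUCE_MIN: if (v < acc) acc = v; break;
                    default:         acc += v;             break; /* sum, mean */
                    }
                }
                if (reduce_type == REDUCE_MEAN)
                    acc /= (float)num_cores;
                outputs[c * chunk + j] = acc;
            }
        }
    }

With two cores holding ``{1, 2, 3, 4}`` and ``{10, 20, 30, 40}``, a ReduceSum under these assumptions leaves ``{11, 22}`` on core 0 and ``{33, 44}`` on core 1. If the library distributes results differently, only the index arithmetic in the outer loops needs to change.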